Skip to content

fix(physical-optimizer): make OutputRequirements idempotent#22522

Merged
kosiew merged 1 commit into
apache:mainfrom
wirybeaver:fix-output-requirements-idempotence
May 31, 2026
Merged

fix(physical-optimizer): make OutputRequirements idempotent#22522
kosiew merged 1 commit into
apache:mainfrom
wirybeaver:fix-output-requirements-idempotence

Conversation

@wirybeaver

@wirybeaver wirybeaver commented May 26, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Related to apache/datafusion-ballista#1359

Rationale

Ballista's Adaptive Query Execution (AQE) planner re-invokes DataFusion's full PhysicalOptimizer chain after every completed stage (AdaptivePlanner::replan_stages). Rules that are not idempotent (rule(rule(x)) != rule(x)) stack execution-plan nodes on each pass.

OutputRequirements::new_add_mode() wraps the plan root with OutputRequirementExec to preserve global ordering/distribution requirements. On a second pass the wrapper's maintains_input_order() == [true] and required_input_ordering() == [None] cause require_top_ordering_helper to recurse through it and produce a second wrapper, yielding OutputRequirementExec(OutputRequirementExec(...)). Each AQE replan adds another layer.

What changes are included in this PR?

  • Guard in require_top_ordering(): if the plan root is already an OutputRequirementExec, return it unchanged. This makes the rule idempotent with zero overhead for single-pass use.
  • Doc-comment update on new_add_mode() and require_top_ordering() documenting the idempotence guarantee.
  • Two tests in tests/physical_optimizer/output_requirements.rs:
    • add_mode_is_idempotent_on_bare_scan — bare ParquetExec (exercises is_changed = false path).
    • add_mode_is_idempotent_on_sorted_planSortExec → ParquetExec (exercises is_changed = true path).

Are these changes tested?

Yes. Two new tests run the rule twice on distinct fixtures and assert structural equality via get_plan_string. Both fail without the fix (double-wrapped OutputRequirementExec) and pass with it.

Are there any user-facing changes?

No. OutputRequirementExec is an internal ancillary node stripped before execution; the idempotence guard only affects re-optimization scenarios (AQE).

`OutputRequirements::new_add_mode()` was stacking a new
`OutputRequirementExec` wrapper at the top of the plan on every
invocation. `require_top_ordering_helper` descends through any operator
that maintains input order and has no hard ordering requirement, and
`OutputRequirementExec` itself qualifies — so on a second pass the
helper walked past the existing wrapper, found no SortExec/SPM beneath,
and `require_top_ordering` added a fresh wrapper above the old one.

Fix: at the start of `require_top_ordering`, return the plan unchanged
when it is already topped by an `OutputRequirementExec`. The existing
wrapper was either added by this rule's prior run (in which case it is
already correct) or inserted intentionally by the caller (in which case
we should not disturb it).

Motivation: adaptive execution in datafusion-ballista AQE
(apache/datafusion-ballista#1359) re-runs the entire `PhysicalOptimizer`
chain after every completed stage. Although the chain-level effect is
masked when `OutputRequirements::new_remove_mode()` later strips the
wrapper, the rule itself violates the idempotence property that
`PhysicalOptimizerRule`s are expected to satisfy. Surfaced by the
discovery harness added in a sibling PR.

Adds `datafusion/core/tests/physical_optimizer/output_requirements.rs`
with a focused test that invokes `new_add_mode` twice on a bare parquet
scan and asserts structural equality. Fails before this fix (two stacked
wrappers); passes after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@kosiew kosiew left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wirybeaver
Thanks for working on this.

Looks 👍 to me

@wirybeaver

Copy link
Copy Markdown
Contributor Author

@wirybeaver Thanks for working on this.

Looks 👍 to me

Thanks for the review!

@milenkovicm

Copy link
Copy Markdown
Contributor

Thanks @kosiew and @wirybeaver

@kosiew kosiew added this pull request to the merge queue May 31, 2026
Merged via the queue into apache:main with commit d9ea38b May 31, 2026
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants